NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Adaptive Draft-Verification for Efficient Large Language Model Decoding

https://doi.org/10.1609/aaai.v39i23.34647

Liu, Xukun; Lei, Bowen; Zhang, Ruqi; Xu, Dongkuan DK (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

Large language model (LLM) decoding involves generating a sequence of tokens based on a given context, where each token is predicted one at a time using the model's learned probabilities. The typical autoregressive decoding method requires a separate forward pass through the model for each token generated, which is computationally inefficient and poses challenges for deploying LLMs in latency-sensitive scenarios.The main limitations of current decoding methods stem from their inefficiencies and resource demands. Existing approaches either necessitate fine-tuning smaller models, which is resource-intensive, or relying on fixed retrieval schemes to construct drafts for the next tokens, which lack adaptability and fail to generalize across different models and contexts.To address these issues, we introduce a novel methodology called Adaptix, which accelerates LLM decoding without requiring fine-tuning. Our approach involves an adaptive draft-verification process that evolves over time to improve efficiency. We utilize a tri-gram matrix-based LLM representation to dynamically approximate the output distribution of the LLM, allowing the model to adjust to changing token probabilities during the decoding process. Additionally, we implement a draft construction mechanism that effectively balances exploration and exploitation, ensuring that the drafts generated are both diverse and close to the true output distribution of the LLM.The importance of this design lies in its ability to optimize the draft distribution adaptively, leading to faster and more accurate decoding. Through extensive experiments on various benchmark datasets and LLM architectures, we demonstrate that Adaptix significantly accelerates the decoding process while maintaining high accuracy, making it suitable for deployment in a wide range of practical applications.
more » « less
Full Text Available
Bayesian optimization with adaptive surrogate models for automated experimental design

https://doi.org/10.1038/s41524-021-00662-x

Lei, Bowen; Kirk, Tanner Quinn; Bhattacharya, Anirban; Pati, Debdeep; Qian, Xiaoning; Arroyave, Raymundo; Mallick, Bani K. (December 2021, npj Computational Materials)

Abstract Bayesian optimization (BO) is an indispensable tool to optimize objective functions that either do not have known functional forms or are expensive to evaluate. Currently, optimal experimental design is always conducted within the workflow of BO leading to more efficient exploration of the design space compared to traditional strategies. This can have a significant impact on modern scientific discovery, in particular autonomous materials discovery, which can be viewed as an optimization problem aimed at looking for the maximum (or minimum) point for the desired materials properties. The performance of BO-based experimental design depends not only on the adopted acquisition function but also on the surrogate models that help to approximate underlying objective functions. In this paper, we propose a fully autonomous experimental design framework that uses more adaptive and flexible Bayesian surrogate models in a BO procedure, namely Bayesian multivariate adaptive regression splines and Bayesian additive regression trees. They can overcome the weaknesses of widely used Gaussian process-based methods when faced with relatively high-dimensional design space or non-smooth patterns of objective functions. Both simulation studies and real-world materials science case studies demonstrate their enhanced search efficiency and robustness.
more » « less
Full Text Available
Estimation of COVID-19 spread curves integrating global data and borrowing information

https://doi.org/10.1371/journal.pone.0236860

Lee, Se Yoon; Lei, Bowen; Mallick, Bani; Samy, Abdallah M. (July 2020, PLOS ONE)

Full Text Available

Search for: All records